OPEN ACCESS Freely available online

PLOS ONE

Effects of GWAS-Associated Genetic Variants on lncRNAs
within IBD and T1D Candidate Loci

Aashiq H. Mirza, Simranjeet Kaur, Caroline A. Brorsson, Flemming Pociot*

1 Copenhagen Diabetes Research Center (CPH-DIRECT), Department of Pediatrics E, Herlev University Hospital, Herlev, Denmark, 2 Faculty of Health and Medical Sciences,
University of Copenhagen, Copenhagen, Denmark, 3 Center for non-coding RNA in Technology and Health, University of Copenhagen, Copenhagen, Denmark 

Abstract 

Long non-coding RNAs are a new class of non-coding RNAs that are at the crosshairs in many human diseases such as
cancers, cardiovascular disorders, and inflammatory and autoimmune diseases like Inflammatory Bowel Disease (IBD) and Type 1
Diabetes (T1D). Nearly 90% of the phenotype-associated single-nucleotide polymorphisms (SNPs) identified by genome-wide
association studies (GWAS) lie outside of the protein coding regions, and map to the non-coding intervals. However,
the relationship between phenotype-associated loci and the non-coding regions including the long non-coding RNAs
(lncRNAs) is poorly understood. Here, we systematically identified all annotated IBD and T1D loci-associated lncRNAs, and
mapped nominally significant GWAS/ImmunoChip SNPs for IBD and T1D within these lncRNAs. Additionally, we identified
tissue-specific cis-eQTLs, and strong linkage disequilibrium (LD) signals associated with these SNPs. We explored sequence
and structure based attributes of these lncRNAs, and also predicted the structural effects of mapped SNPs within them. We
also identified lncRNAs in IBD and T1D that are under recent positive selection. Our analysis identified putative lncRNA
secondary structure-disruptive SNPs within and in close proximity (+/- 5 kb flanking regions) of IBD and T1D loci-associated
candidate genes, suggesting that these RNA conformation-altering polymorphisms might be associated with the disease
phenotype. Disruption of lncRNA secondary structure due to the presence of GWAS SNPs provides valuable information that
could be potentially useful for future structure-function studies on lncRNAs.

Introduction

Long non-coding RNAs (lncRNAs) are a recently discovered class
of non-coding RNAs that are 200 nucleotides or longer in
transcript length, are similar to protein-coding genes and are
sometimes transcribed as whole or partial antisense transcripts to
coding genes [1,2]. LncRNA genes are poorly conserved in
sequence across different species and do not contain any conserved
motifs [3-5]. Like other RNA species, lncRNAs also form
secondary structures that play critical roles in functional
mechanisms [6]. In recent years, lncRNAs have emerged as important
regulatory players of gene expression. An increasing number of
studies have implicated lncRNAs in a wide range of biological and
cellular processes including development [7], localization [8],
alternative splicing [9], chromatin remodeling [10], cell cycle [11],
survival [12], migration [13] and metabolism [6]. Importantly,
lncRNAs are known to modulate gene expression in both cis- and
trans-acting manners, and broadly exert their effect by direct
interaction with chromatin-modifying proteins and transcription
factors, promoter inactivation by binding to basal transcription
factors, and activation of an accessory protein [1,14-18].
Furthermore, lncRNAs can also regulate gene expression at the
post-transcriptional level by binding to specific miRNAs and
subsequently preventing these miRNAs from binding to their target
mRNA transcripts [19]. Antisense lncRNA transcripts have also
been shown to control the transcription of protein-coding genes in
cis [20,21].

The role of lncRNAs in various human diseases has recently
been collated and described elsewhere [22,23]. An accumulating
body of evidence has linked mutations, alterations in the primary
structure, secondary structure, and expression levels of lncRNAs to
a number of human pathologies including autoimmune diseases,
cancers and neurodegenerative diseases [13,15,16,22,24-27].
More recently, an elegant and comprehensive study explored the
strand-specific transcriptome of human pancreatic islets and beta
cells, and identified 1128 islet-specific lncRNA genes involved in
beta cell differentiation and maturation [28]. Additionally, in
recent years, genome-wide association studies (GWAS) have been
instrumental in identifying a large number of disease-predisposing
single nucleotide polymorphisms (SNPs), particularly in the
autoimmune diseases [29-31]. Surprisingly, only a fraction of
these identified variations are located within the protein-coding
genes and a majority of these SNPs map to the non-coding
intervals [32-34] including lncRNAs, and many of these genetic
variations are likely to have a role in gene regulation [35,36]. A
powerful method to elucidate the genetic component underlying
altered gene expression is mapping of expression quantitative trait
loci (eQTL) [37]. eQTLs that map close to, and regulate, nearby
genes are referred to as cis-eQTLs. In contrast, eQTLs that
regulate distant genes, or genes on a different chromosome, are
referred to as trans-eQTLs [37].
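
The cis/trans distinction can be sketched in a few lines of Python. This is only an illustration: the `Eqtl` record, the `classify_eqtl` function and the default window are hypothetical names and values chosen for the sketch. The 1 Mb window mirrors the cis window around the transcription start site used for the pre-computed eQTLs later in this study; other analyses may use a different distance.

```python
from dataclasses import dataclass

# Assumed cis window: +/- 1 Mb around the gene's transcription start site.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Eqtl:
    """One SNP-gene association (hypothetical record layout)."""
    snp_chrom: str
    snp_pos: int
    gene_chrom: str
    gene_tss: int


def classify_eqtl(eqtl: Eqtl, window: int = CIS_WINDOW_BP) -> str:
    """Return 'cis' if the SNP lies on the same chromosome as the gene and
    within `window` bp of its TSS; otherwise return 'trans'."""
    same_chrom = eqtl.snp_chrom == eqtl.gene_chrom
    if same_chrom and abs(eqtl.snp_pos - eqtl.gene_tss) <= window:
        return "cis"
    return "trans"
```

Under this definition a SNP on another chromosome is always trans, and a SNP on the same chromosome is trans only when it falls outside the chosen window.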

SNP-induced changes may affect transcription-factor binding
sequences, translational efficiency and trans regulators such as
lncRNAs and miRNAs. Recent evidence suggests that
disease-associated SNPs located within the regulatory regions of
non-coding RNAs could potentially perturb the structural motifs and
disrupt their function, and thereby lead to disease [38].
Nevertheless, the underlying molecular mechanisms by which these
genetic variations within the functional motifs of non-coding
RNAs potentially affect their regulatory domains and cause
abrogation of molecular interactions remain to be elucidated.
Understanding such mechanisms could be advantageous in
unraveling the functional roles of lncRNAs and their associated
SNPs in the disease context.

Inflammatory bowel disease (IBD) and type 1 diabetes (T1D)
are immune-mediated diseases that share common susceptibility
pathways and genes. Comparative analysis of susceptibility loci
between different immune-mediated disorders has yielded
important insights into their common underlying genetic
architecture. IBD and T1D share multiple loci, however, often
with contrasting effects. For example, a missense SNP
(R620W, rs2476601) in protein tyrosine phosphatase non-receptor
type 22 (PTPN22) has been shown to be associated with several
autoimmune diseases including T1D, rheumatoid arthritis and
Crohn's disease (CD) but with opposite directions of association
[39]. However, the functional and structural consequences of
autoimmune disease-associated genetic variations located within the
IBD and T1D loci-associated lncRNAs have largely remained
unaddressed.

In the present study, we conducted a systematic search for all
annotated IBD and T1D loci-associated lncRNAs, and characterized
their sequence- and structure-based features. We
identified structure-disruptive SNPs within the linkage
disequilibrium (LD) blocks defined for the GWAS/ImmunoChip loci and
predicted their effect on the lncRNA secondary structure.
Furthermore, we also identified distinct tissue-specific expression
patterns, cis-eQTL signals, ENCODE-based regulatory features
and evidence for recent positive selection for the IBD and T1D
loci-associated lncRNAs. These findings suggest that single genetic
risk variants located within non-coding regions could play
regulatory roles and possibly alter the function of lncRNAs by
modulating their expression and spatial conformation, which in
turn could profoundly affect neighboring (cis) or distal (trans)
candidate genes. The overall pipeline employed for this study is
outlined in Figure 1. Although we restricted our study to IBD and
T1D loci, this approach can be extended to other
autoimmune disorders.

Results 

IBD and T1D loci-associated lncRNAs

The majority of lncRNAs are transcribed as complex interlaced
networks sharing genomic sequences within a number of different
intersecting coding and non-coding transcripts in sense and
antisense directions [40]. On dissecting the IBD and T1D loci
datasets, we found that 3665 lncRNA genes intersect 1168 IBD
candidate genes (Table 1). Based on strand orientation (Figure 2),
1440 lncRNA genes were found to be antisense to 750 IBD
candidate genes. Also, 2133 lncRNA genes intersected 1004 IBD
candidate genes with 100% cis-overlap on the same strand, and 2245
lncRNA genes intersected 1038 IBD candidate genes with an
overlap of at least one nucleotide on the same strand. In the case of
the T1D loci, 762 lncRNA genes intersected 660 T1D candidate
genes. Likewise, when taking strand orientation into account, 317
lncRNA genes were found to be antisense to 297 T1D candidate
genes. Furthermore, 611 lncRNA genes intersected 473 T1D
candidate genes with 100% cis-overlap on the same strand. We
also observed 690 lncRNA genes intersecting 579 T1D candidate
genes on the same strand with an overlap of at least one
nucleotide.

For lncRNAs located in the proximity of IBD and T1D
candidate genes, we screened 5 kb up/down-stream flanking
regions to identify intergenic lncRNAs associated with IBD and
T1D loci. Regardless of the strand orientation, we identified 1131
and 537 intergenic lncRNA genes 5 kb up/down-stream of 778
IBD and 588 T1D candidate genes, respectively (Table 1). When
considering strand orientation, 752 and 383 antisense lncRNA
genes mapped to 624 IBD and 422 T1D candidate genes,
respectively. Also, 395 and 344 lncRNA genes mapped to the
same strand as 339 IBD and 364 T1D candidate genes,
respectively, with an overlap of at least one nucleotide.
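
The overlap categories used above can be illustrated with a small interval classifier. The sketch below is not the pipeline used in this study: the `Interval` type, the function names and the category labels are hypothetical, "100% overlap" is assumed to mean that the lncRNA lies entirely inside the candidate gene, and the categories are made mutually exclusive for simplicity (whereas the "at least one nucleotide" counts reported above also include fully contained lncRNAs).

```python
from dataclasses import dataclass

FLANK_BP = 5_000  # 5 kb up/down-stream flank, as in the text


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of shared bases between two intervals (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def classify(lnc: Interval, gene: Interval, flank: int = FLANK_BP) -> str:
    """Assign one lncRNA/gene pair to a single (assumed) category."""
    same_strand = lnc.strand == gene.strand
    shared = overlap_bp(lnc, gene)
    if shared > 0:
        if not same_strand:
            return "antisense intersecting"
        if shared == lnc.end - lnc.start + 1:
            return "sense 100% overlapping"
        return "sense intersecting"
    # No direct overlap: test the gene body widened by the flank on each side.
    widened = Interval(gene.chrom, gene.start - flank, gene.end + flank, gene.strand)
    if overlap_bp(lnc, widened) > 0:
        return "flanking sense" if same_strand else "flanking antisense"
    return "unassociated"
```

In practice such intersections are usually computed with a dedicated interval tool; the point of the sketch is only to make the strand and flank logic explicit.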

In total, we identified 4272 and 816 lncRNA genes within and
in 5 kb up/down-stream proximity of 1231 IBD and 761
T1D candidate genes, respectively. Furthermore, following the
NONCODEv4 classification criteria for lncRNA categories based on
their genomic location in relation to all protein-coding genes, we
identified 1692 sense exonic, 579 sense non-exonic, 1345 antisense
and 767 intergenic IBD loci-associated lncRNAs. In the case of
the T1D loci-associated lncRNAs, we identified 381 sense exonic,
53 sense non-exonic, 188 antisense and 225 intergenic lncRNAs.

Sequence analysis of IBD and T1D loci-associated
lncRNAs

Sequence features such as GC content and repetitive
elements of coding and non-coding genes are well-known
attributes coupled to their biological functions. In vertebrates
including humans, GC content is known to vary significantly
between different genomic regions [41]. Several studies have
established that GC content is strongly associated with various
genomic features like gene density, recombination rate, and
distribution of repetitive elements within the human genome [42].
The average length distribution (log transformed) of all lncRNA
genes is shown in a density plot (Figure S1). Our sequence analysis of
IBD and T1D loci-associated lncRNAs revealed that lncRNAs
associated with the T1D loci are shorter in length (average length
= 11,784) as compared to both IBD loci-associated lncRNAs
(average length = 23,285) and total lncRNAs (average
length = 15,929) (Figure S2). The overall length distribution of
IBD loci-associated lncRNAs varied significantly compared to the
total lncRNAs (p-value < 10e-6, Welch two sample t-test). For
IBD and T1D candidate genes, the average GC content was found
to be 42% and 47% respectively. The average GC content of all
known lncRNA genes was found to be 42%. IBD and T1D
loci-associated lncRNA genes had an average GC content of 43.5%
and 48% respectively (Figure S3). Our results showed significantly
higher GC content in T1D loci-associated lncRNA genes (p-value
< 10e-6, Welch two sample t-test) as compared to the background
(total lncRNAs).
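
As a rough illustration of the two quantities used in this comparison, the following Python sketch computes GC content for a sequence and the Welch two-sample t statistic for two groups of values. The function names are hypothetical, ambiguous bases are simply ignored when computing GC content (an assumption), and converting the statistic to a p-value would additionally require the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

Here `x` and `y` would be, for example, the per-gene GC percentages of a loci-associated set and of the background set.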

Furthermore, we also analyzed the relative abundance of
various classes of repeat elements within all annotated human
lncRNAs. Approximately 81% of the total lncRNA genes (45,880
lncRNA genes) were found to harbor repeat elements. It was

Figure 1. Schematic outline of the analysis pipeline. The summary of the steps involved in this study. All human lncRNAs (sense exonic, sense
non-exonic, antisense and intergenic) located within and in close proximity (5 kb up/downstream) of the IBD and T1D candidate genes were
identified. Based on the above described workflow, we predicted potential lncRNA secondary structure-disruptive GWAS SNPs within the IBD and T1D
loci-associated lncRNAs. cis-eQTL signals were identified for these lncRNAs and linkage disequilibrium analysis was also explored for selected SNPs.
We exploited HapMap data to identify candidate lncRNAs under recent positive selection.
doi:10.1371/journal.pone.0105723.g001

Figure 2. Mapped lncRNA intervals within IBD and T1D susceptibility loci genes. Categorization of the genomic association of the lncRNAs
to the IBD and T1D loci-associated genes, as sense exonic (A), sense non-exonic (B), antisense (C), and intergenic (D).
doi:10.1371/journal.pone.0105723.g002

Table 1. IBD and T1D candidate gene associated lncRNAs.

                             IBD                         T1D
Category                     LncRNA genes   IBD genes    LncRNA genes   T1D genes

Exonic/Non-Exonic and Antisense lncRNAs within IBD and T1D candidate genes
Sense intersecting           2245           1038         690            579
Sense 100% overlapping       2133           1004         611            473
Antisense intersecting       1440           750          317            297
Total                        3665           1168         762            660

Intergenic lncRNAs in 5 kb up/down-stream proximity of IBD and T1D candidate genes
Sense                        395            339          344            364
Antisense                    752            624          383            422
Total                        1131           778          537            588

Total mapped Exonic/Non-Exonic, Antisense and Intergenic lncRNA genes
Overall Total                4272           1231         816            761

IBD and T1D candidate gene associated lncRNAs located in and around the (+/-) 5 kb up/downstream of start and end site of each candidate gene. Number of exonic/
non-exonic, antisense lncRNAs are described for both IBD and T1D.
doi:10.1371/journal.pone.0105723.t001



observed that the interspersed repeat families, i.e. SINEs, LINEs,
LTRs and DNA elements, were the most abundant repeat classes
within the lncRNA genes and constituted 85% of the total repeats
(Figure S4 and S5). The overall percentage of interspersed repeat
families observed within these lncRNA genes was 34%, 27%, 13%
and 9% for SINEs, LINEs, LTRs and DNA elements, respectively.
The SINE repeat family was the most abundant repeat family in the
lncRNAs; it is also the most abundant repeat family in the human
genome after LINEs. Interestingly, we observed very few low
complexity regions and simple repeats within the lncRNA genes.
We also compared the distribution of repeats within the IBD and
T1D loci-associated lncRNAs. Our analysis revealed that almost 85%
of the IBD loci-associated lncRNA genes
harbored repeat elements. The interspersed repeat elements
(35% SINEs, 28% LINEs, 11% LTRs and 10% DNA elements)
constituted 84% of the total repeats found within the IBD
loci-associated lncRNAs. In the case of T1D loci-associated lncRNA
genes, 79% of the lncRNA genes harbored repeat elements and
interspersed repeat elements constituted 85% of the total repeats
(39% SINEs, 26% LINEs, 10% LTRs and 10% DNA elements).
The interspersed repeat element distributions for both IBD and
T1D loci-associated lncRNAs were found to be significantly different
(p-value < 10e-6, Chi-square goodness of fit test) from the
background (total lncRNAs). In the case of SINEs within the T1D
loci-associated lncRNAs, the absolute value of the standardized
residual was found to be 12 (z score >1.96). These results suggest
significant enrichment of SINEs in T1D loci-associated lncRNAs.
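
The goodness-of-fit comparison described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name and input layout are hypothetical, the expected proportions come from a background distribution supplied by the caller, and the standardized residual is taken to be (O - E) / sqrt(E (1 - p)), which is one common definition; the exact variant used in the study is not stated in the text.

```python
from math import sqrt


def chi_square_gof(observed: dict, background_props: dict):
    """Chi-square goodness-of-fit statistic and per-class standardized
    residuals for observed counts against background proportions."""
    n = sum(observed.values())
    total_p = sum(background_props.values())  # normalise in case props are percentages
    statistic = 0.0
    residuals = {}
    for repeat_class, obs in observed.items():
        p = background_props[repeat_class] / total_p
        expected = n * p
        statistic += (obs - expected) ** 2 / expected
        # |residual| > 1.96 flags a class that deviates at the 5% level
        residuals[repeat_class] = (obs - expected) / sqrt(expected * (1 - p))
    return statistic, residuals
```

A large positive residual for one class (for example SINEs) indicates over-representation of that class relative to the background, which is how the enrichment statement above can be read.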

GWAS/ImmunoChip SNPs within IBD and T1D loci-associated
lncRNAs

Since the vast majority of GWAS signals map to the non-coding
regions of the genome, where they are known to play many
regulatory roles [33,34], we mapped all the nominally associated
IBD and T1D SNPs within the IBD and T1D loci-associated
lncRNAs. Overall, 26,283 GWAS/ImmunoChip SNPs were
retrieved for IBD and 18,416 GWAS SNPs for T1D loci. All
these SNPs were first mapped to all the human lncRNA genes, and
then to the IBD and T1D loci-associated lncRNA genes. In the case of
all the human lncRNA genes, 7893 IBD SNPs were found to be
present within 2523 lncRNA genes, whereas 5273 T1D SNPs
were found to be present within 2235 lncRNA genes. For IBD
loci-associated lncRNA genes, 2063 SNPs were found to be
present within 468 lncRNA genes. For T1D loci-associated
lncRNA genes, 1045 SNPs were found to be present within 247
lncRNA genes. Within the shared IBD and T1D-associated
lncRNA genes, we found 44 common SNPs. The SNPs that
mapped within the IBD and T1D loci-associated lncRNAs were
selected for further analysis to predict their potential to disrupt the
lncRNA secondary structures.
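
Mapping SNP positions onto lncRNA gene intervals, as described above, amounts to a point-in-interval lookup. The sketch below is a minimal version, assuming 1-based inclusive coordinates; the function names, tuple layouts and identifiers are hypothetical, and the prefix scan is deliberately simple rather than optimal.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(genes):
    """genes: iterable of (gene_id, chrom, start, end).
    Returns per-chromosome lists of (start, end, gene_id), sorted by start."""
    index = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        index[chrom].append((start, end, gene_id))
    for chrom in index:
        index[chrom].sort()
    return index


def map_snps(snps, index):
    """snps: iterable of (snp_id, chrom, pos).
    Returns {gene_id: [snp_id, ...]} for every gene containing a SNP."""
    hits = defaultdict(list)
    for snp_id, chrom, pos in snps:
        intervals = index.get(chrom, [])
        # Only intervals whose start is <= pos can contain the SNP.
        upper = bisect_right(intervals, (pos, float("inf"), ""))
        for start, end, gene_id in intervals[:upper]:
            if end >= pos:
                hits[gene_id].append(snp_id)
    return dict(hits)
```

Because lncRNA genes can overlap one another, a single SNP may be reported for more than one gene; counting distinct SNPs and distinct genes separately gives figures of the kind quoted in this section.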

LncRNA structure-disruptive GWAS/ImmunoChip SNPs in
IBD and T1D

Studies have demonstrated that SNPs residing in and around
key regulatory regions of lncRNA genes are
significantly associated with increased susceptibility
to various diseases [36]. We focused on the impact of
GWAS/ImmunoChip SNPs within IBD and T1D loci-associated
lncRNAs, and identified a number of SNPs with significant
propensity to disrupt the secondary structure of the loci-associated
lncRNAs. SNP-induced structural perturbations within
lncRNAs could disrupt their molecular functions and
could therefore contribute to the disease
phenotype. From our analysis, we present a list of 362 and 178
structure-disruptive SNPs in the IBD and T1D loci-associated lncRNAs
respectively that were found to cause significant secondary
structure changes in their associated lncRNAs, with empirical
p-value <0.2 (Table S1). For example, two SNPs (rs3757247 and
rs597325) perturb the secondary structure of the sense exonic
lncRNA NONHSAG044354 associated with candidate gene
BACH2 (Figure 3).
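
To give a concrete sense of what a structural change between wild-type and mutant means, the sketch below compares two secondary structures written in dot-bracket notation and counts the base pairs present in one but not the other (the base-pair distance). This is a deliberately simplified stand-in, not the method used here: RNAsnp compares base-pair probability distributions and reports empirical p-values, whereas this example looks only at two single structures. Function names are hypothetical and pseudoknots are not supported.

```python
def base_pairs(dot_bracket: str) -> set:
    """Parse a dot-bracket string into a set of (i, j) paired positions."""
    stack, pairs = [], set()
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(wild_type: str, mutant: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(wild_type) != len(mutant):
        raise ValueError("structures must have the same length")
    return len(base_pairs(wild_type) ^ base_pairs(mutant))
```

A distance of zero means the two predicted structures are identical; larger values indicate that the SNP rearranges more of the pairing pattern.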

Common structure-disruptive SNPs between IBD and
T1D loci-associated lncRNAs

Considering the commonality of the associated risk loci between
IBD and T1D, we searched for common
structure-disruptive SNPs shared between their loci-associated lncRNAs.
Indeed, we found seven structure-disruptive SNPs (rs5763746,
rs1476514, rs41176, rs41158, rs3757247, rs597325 and rs602662)
to be common and located in the same locus of IBD and
T1D-associated lncRNAs (Table 2). Four of these SNPs (rs5763746,

Figure 3. Structure disruption of lncRNA NONHSAG044354 associated with BACH2 (implicated in both IBD and T1D) by GWAS SNPs
rs3757247 and rs597325. The structure-disruptive effects of SNPs rs3757247 (A) and rs597325 (B) located in lncRNA NONHSAG044354 associated
with candidate gene BACH2. (a) UCSC genome browser view showing the location of the predicted local region disrupted by the SNP. The color of
predicted local region (green in this case) is based on the RNAsnp p-value (0.129 and 0.120 for rs3757247 and rs597325 respectively). (b) Minimum
free energy structures (MFE) of the global wild-type and mutant sequences displaying the secondary structure and the local region affected by the
SNP position, colored green (wild-type) and red (mutant). (c) Dot plot representing the base pair probabilities of wild-type and mutant RNA sequences
corresponding to the predicted local region by RNAsnp. The upper and lower triangle of the matrix represents the base pair probabilities of wild-type
(green) and mutant (red), respectively.
doi:10.1371/journal.pone.0105723.g003



rs1476514, rs41176 and rs41158) were located within a single
antisense lncRNA, NONHSAG033653. Interestingly, lncRNA
NONHSAG033653 is in close proximity to the HORMAD2
(22q12.2) candidate gene, which has been implicated in both IBD
and T1D. SNP rs41158 was found to be located within the T1D
candidate gene MTMR3. An index SNP, rs602662, was found to
be present within antisense lncRNA NONHSAG026183 and
candidate gene FUT2 (19q13.33) [71]. Two SNPs, rs3757247 and
rs597325, were located within lncRNA NONHSAG044354 and the
BACH2 gene (6q15), an important candidate gene in both IBD
and T1D (Figure 3). Investigating the pattern of LD across the
BACH2 locus revealed that the structure-disruptive SNP
rs3757247 was strongly correlated (r^2 = 0.949) with the T1D risk
SNP rs11755527, and was also correlated (r^2 = 0.565) with
the IBD risk SNP rs1847472 (Figure 4).
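
For reference, the r^2 measure of LD quoted above is the squared correlation between alleles at two sites. A minimal sketch, assuming phased haplotypes coded as 0/1 at each of the two SNPs (the function name and input layout are hypothetical, and the toy inputs are not data from this study):

```python
def ld_r2(haplotypes) -> float:
    """r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp_a, allele_at_snp_b), each coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    return d * d / denominator
```

An r^2 near 1 means that the allele at one SNP almost fully predicts the allele at the other, which is why a structure-disruptive SNP in high LD with a reported risk SNP is a plausible candidate for the underlying signal.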

ENCODE annotation and cis-eQTL mapping of
structure-disruptive SNPs in IBD and T1D loci-associated lncRNAs

Disease-associated SNPs are highly enriched within the
ENCODE-defined non-coding functional element regions. In
many cases, these SNPs are known to have regulatory functions,
and some are associated with bona fide eQTLs [37]. We retrieved
ENCODE annotation for the 362 and 178 structure-disruptive SNPs
in IBD and T1D loci-associated lncRNAs using the RegulomeDB
database [43]. We found that 254 out of 362 and 143 out of 178
structure-disruptive SNPs in IBD and T1D loci-associated
lncRNAs were associated with transcription factor (TF) binding,
eQTLs or DNase peaks, and were therefore likely to affect
binding.

We retrieved pre-computed significant cis-eQTLs from 9 tissues
(adipose subcutaneous, artery tibial, heart left ventricle, lung,
muscle skeletal, nerve tibial, skin sun exposed, thyroid and whole
blood) tested in more than 80 samples using a cis window of +/-
1 Mb around the transcription start site (TSS) for the
structure-disruptive SNPs in IBD and T1D loci-associated lncRNAs using
GTEx [44]. Out of 362 structure-disruptive SNPs from IBD
loci-associated lncRNAs, only 94 SNPs had associated cis-eQTLs
(Table S1). In the case of the 178 structure-disruptive SNPs in T1D
loci-associated lncRNAs, only 28 SNPs had cis-eQTLs. We also
examined the gene-SNP association patterns for the
structure-disruptive SNPs within IBD and T1D loci-associated lncRNAs,
and found significant associations for these SNPs. For example, we
observed significant association of the common
structure-disruptive SNP (rs3757247) for both IBD and T1D loci-associated
lncRNA NONHSAG044354 with the IBD and T1D candidate gene
BACH2 only in whole blood (Figure 5). We also observed
significant tissue-specific cis-eQTL signals associated with the P4HA2,
P4HA2-AS1, and AC063976.6 genes in thyroid, the SLC22A5 gene
in whole blood, lung and skin sun exposed, and the AC034220.3 gene
in whole blood (Table S1).

IBD and T1D loci-associated lncRNAs with
structure-disruptive SNPs under recent selection pressure

A number of studies report that many genes exhibit very strong
signals of recent selection in favor of new alleles. These regions of
recent positive selection indicate the presence of genetic variants
that are a source of significant phenotypic variation [45]. Indeed, it
is known that many of these variants affect complex phenotypes of
clinical relevance [46-48]. Based on selective screening of
Haplotter and HapMap [45,49] datasets, we identified
structure-disruptive SNPs located within the IBD and T1D loci-associated
lncRNAs that are under very recent selection in favor of new
alleles, and have not yet reached fixation. Out of 362
structure-disruptive SNPs in IBD loci-associated lncRNAs, 12 SNPs (present
within 14 IBD loci-associated lncRNAs) were found to be under
recent positive selection in CEU, ASI, ASN and YRI populations.
In the case of the 178 structure-disruptive SNPs in T1D
loci-associated lncRNAs, 13 SNPs (present within 14 T1D
loci-associated lncRNAs) in CEU and YRI populations were found to
be under recent positive selection (Table S1). The presence of
structure-disruptive SNPs under recent positive selection within
IBD and T1D loci-associated lncRNAs provides evidence for the
functional roles of these variants.

Based on RegulomeDB score, cis-eQTLs, recent positive
selection and RNAsnp p-value (p-value <= 0.2), we subsequently
ranked the 362 and 178 structure-disruptive SNPs of IBD and T1D
loci-associated lncRNAs respectively (Table 3 and Table S1). Structure
disruption caused by the top ranked SNPs rs2227319 and
rs243327 within the IBD and T1D loci-associated lncRNAs
respectively, along with their cis-eQTLs, is described in Figure S6
and S7.
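
The text names the four lines of evidence used for ranking but not how they were combined. Purely as an illustration, the sketch below orders SNPs lexicographically: RegulomeDB category first, then presence of a cis-eQTL, then evidence of recent positive selection, then RNAsnp p-value. This priority order, the record layout and all names are assumptions made for the example, not the scheme behind Table 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpEvidence:
    """Hypothetical per-SNP evidence record."""
    snp_id: str
    regulome_score: str    # RegulomeDB category, e.g. "1f" or "2b"; lower sorts first
    has_cis_eqtl: bool
    under_selection: bool
    rnasnp_p: float        # RNAsnp empirical p-value


def rank_snps(records, p_cutoff: float = 0.2):
    """Keep SNPs with RNAsnp p-value <= p_cutoff and sort them by the
    assumed priority: RegulomeDB score, eQTL, selection, p-value."""
    kept = [r for r in records if r.rnasnp_p <= p_cutoff]
    return sorted(
        kept,
        key=lambda r: (
            r.regulome_score,        # plain string comparison of the category label
            not r.has_cis_eqtl,      # False sorts before True, so eQTL SNPs come first
            not r.under_selection,
            r.rnasnp_p,
        ),
    )
```

A weighted score would be an equally reasonable design; the lexicographic form is used here only because it is easy to read and to audit.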

Expression profiles of IBD and T1D loci-associated
lncRNAs

In-depth systematic analysis of lncRNA expression in multiple
human tissues has revealed that lncRNAs are usually expressed at
lower levels than protein-coding genes, and exhibit distinct tissue
and developmental stage-specific expression patterns [50]. In our
datasets, we explored the expression profiles of IBD and T1D
loci-associated lncRNAs using Human BodyMap (HBM) data. We
observed a number of IBD and T1D loci-associated lncRNAs
expressed across all the HBM tissues using a threshold of >1
FPKM (Figure 6, S8 and S9). Based on this threshold, 251 out of
4272 and 79 out of 816 IBD and T1D loci-associated lncRNAs
respectively were found to be expressed across all the HBM tissues.
For the sense exonic/non-exonic lncRNAs mapping
100% within the IBD and T1D candidate genes, 155 out of 2133
and 68 out of 611 IBD and T1D loci-associated lncRNAs
respectively were found to be expressed across all HBM tissues
(Figure S8 and S9). In the case of the antisense IBD and T1D
loci-associated lncRNAs, only 49 out of 1440 and 37 out of 317 were
expressed across all tissues, respectively (Figure 6). We also
calculated Spearman correlations for the lncRNAs expressed
across all the tissues to identify tissues that cluster together based
on highly similar expression patterns. For the antisense lncRNAs,
we observed diverse patterns of expression for IBD and T1D
loci-associated lncRNAs (Figure 7). In the case of the sense exonic/
non-exonic and intergenic lncRNAs, we observed significant
differences in expression patterns across the tissues for IBD and T1D
loci-associated lncRNAs (Figure S10 and S11). In addition, for the
IBD loci-associated lncRNAs, on average 72% of lncRNAs were
not detected in any tissue, while in the case of the T1D loci-associated
lncRNAs, 61% of the lncRNAs were not detected in any of the
tissues at an FPKM threshold of >1 (Table S2).
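
The two expression summaries used above (expressed in every tissue, and not detected in any tissue) follow directly from the FPKM threshold. A minimal sketch, assuming a nested mapping of gene to tissue to FPKM; the function names and the input layout are hypothetical.

```python
def expressed_in_all_tissues(fpkm: dict, threshold: float = 1.0) -> list:
    """Genes whose FPKM exceeds `threshold` in every tissue.

    fpkm: {gene_id: {tissue: fpkm_value}}
    """
    return sorted(
        gene
        for gene, by_tissue in fpkm.items()
        if by_tissue and all(value > threshold for value in by_tissue.values())
    )


def fraction_undetected(fpkm: dict, threshold: float = 1.0) -> float:
    """Fraction of genes that do not exceed `threshold` in any tissue."""
    if not fpkm:
        raise ValueError("no genes supplied")
    undetected = sum(
        1
        for by_tissue in fpkm.values()
        if all(value <= threshold for value in by_tissue.values())
    )
    return undetected / len(fpkm)
```

Genes that pass the threshold in some tissues but not others fall into neither group, which is where tissue-specific expression shows up.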

Furthermore, from our analysis, we observed noticeable
tissue-specific differential expression patterns in lncRNAs and their
associated protein-coding genes across an array of 14 tissues. We
observed differential expression profiles of the IBD and T1D
loci-associated lncRNA NONHSAG044354 and its associated candidate
gene BACH2 in most of the HBM tissues except adipose,
colon, and prostate (Figure 8A). We also found tissue-specific
differential expression for two other common IBD and T1D
candidate genes, FUT2 and HORMAD2, and their associated
lncRNAs (Figure S12A and S12C). In addition, we observed
positive Spearman correlations for the BACH2, FUT2 and HORMAD2
candidate genes with their associated lncRNAs (Figure 8B,
S12B and S12D). Sense exonic lncRNA NONHSAG044354
positively correlated (r_s = 0.85) with its associated candidate gene
BACH2 (Figure 8B). However, in the case of intergenic antisense
lncRNA NONHSAG033653 we observed a weaker positive
correlation (r_s = 0.67) with its associated candidate gene
HORMAD2 (Figure S12B), whereas antisense lncRNA
NONHSAG026183 highly correlated (r_s = 0.95) with its associated
candidate gene FUT2 (Figure S12D). These data are in
concordance with the recent findings of the GENCODEv7 lncRNA
catalog study, in which similar specific enrichment of positive
correlations of lncRNAs intersecting protein coding exons in
antisense orientation with the mRNA host was also reported.
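
The Spearman correlation used for these lncRNA-gene comparisons is the Pearson correlation of the rank-transformed expression values. A self-contained sketch, with hypothetical function names and tied values given their average rank:

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` and `y` would be the FPKM values of a lncRNA and of its associated candidate gene across the same set of tissues, in the same tissue order.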

Discussion

Genome-wide association studies (GWAS) have provided an
excellent platform in pursuit of identifying common risk variants
underlying many polygenic diseases [32], with unprecedented
success especially in several autoimmune diseases [33]. Studies
have suggested that a comparative meta-analysis approach is an
excellent tool for dissecting autoimmune disease risk loci, and
it could help gain valuable insights into understanding the
functional roles of the genes that are shared between these
diseases, and uncover novel disease-loci relationships [51-53].
More than 50% of IBD susceptibility loci overlap with other
inflammatory and autoimmune diseases including T1D. Most of
the risk variants map to the regulatory non-coding regions of the
genome including lncRNAs, which have recently emerged as key
players in regulating fundamental cellular processes. LncRNAs are
known to act as decoys and forestall the access of DNA-binding
proteins such as transcription factors to the DNA. A class of
distinctively expressed lincRNAs can make cell-type-specific
"flexible scaffolds", and these scaffolds in turn interact with the
regulatory protein complexes and alter cell-type-specific gene
expression programs [54].

Furthermore, recent studies have revealed significant similarities
between lncRNAs and 3' untranslated regions (3'UTRs) in terms
of their structural features and sequence composition [55].
LncRNAs and 3'UTRs exhibit lower GC content of 42% and
43% respectively as compared to the protein-coding RNAs
(51.8%). The lower GC content in lncRNAs and 3'UTR
sequences indicates that they might contain fewer stably
base-paired structures, making their primary sequence more accessible
for interactions with cellular factors [55]. However, in our analysis,
we found the GC content of T1D loci-associated lncRNAs to be
comparatively higher (48%) than that of the IBD loci-associated lncRNAs
(43.5%) and total NONCODEv4 human lncRNAs (42%). Several
studies have reported association of the GC content with various
intrinsic genomic features such as distribution of repeat elements
[42]. Interestingly, in the case of T1D loci-associated lncRNAs, we
observed over-representation of SINE elements as compared to
the IBD loci-associated lncRNAs and all human lncRNAs.
Collectively, we found approximately 82% of human lncRNAs
harbor repetitive elements, and out of these around 92% harbor

Figure 4. Regional LD plot for SNP rs3757247 associated with IBD and T1D loci. The regional LD plot for SNP rs3757247 was calculated
using HapMap3 (release 2) in SNAP tool. SNPs are represented as diamonds and the brightness of each SNP is proportional to the threshold value
for that SNP. SNPs rs11755527 and rs1847472 had an r2 value of 0.949 (T1D p-value_meta 5.38e-08, Barrett et al 2009 [52]) and 0.565 (IBD p-value_ichip
1.19e-04, IBD p-value_meta 1.10e-09, Jostins et al 2012 [71]) respectively (shown in red).
doi:10.1371/journal.pone.0105723.g004



interspersed repeat elements. SINE elements such as Alus have
been implicated in gene and miRNA regulation. Alus within
the 3'UTR have been found to be involved in Staufen1 (STAU1)
mediated mRNA decay (SMD) [56]. Base-pairing of AluJo
(100 nt) in the 3'UTR of the SERPINE1 gene, and of AluSx (300 nt) in
the 3'UTR of the FLJ21870 gene, with another Alu repeat located within
a cytoplasmic polyadenylated lncRNA (AF087999) forms RNA-RNA
duplexes that are targets for SMD [56]. Notably, the
over-representation of Alu repeat elements within the T1D
loci-associated lncRNAs suggests that these lncRNAs might be involved
in similar regulatory functions based on RNA-RNA interactions.

Emerging lines of evidence suggest that expression dysregulation
and large- and small-scale mutations in the primary sequence of
lncRNAs are strongly linked with phenotype changes and disease
susceptibility. Genetic mutations are transmitted to the
transcriptome during lncRNA transcription and could
perturb lncRNA function, particularly if such mutations are
located within the embedded regulatory motifs of lncRNAs.
Indeed, various studies have established that genetic variations
present within ncRNAs could alter their functions,
predominantly by inducing changes in their folding patterns and
secondary structure stability and by affecting expression. For example,
multiple SNPs within the 5' and 3' untranslated regions (UTRs)
altered the mRNA structural ensemble of the associated
genes in six disease states (hyperferritinemia cataract syndrome,
beta-thalassemia, cartilage-hair hypoplasia, retinoblastoma,
chronic obstructive pulmonary disease (COPD), and hypertension) [38].
Furthermore, specific GWAS SNPs in and around the antisense
lncRNA ANRIL have been shown to alter the transcription and
processing of ANRIL transcripts, which are associated with
increased susceptibility to coronary disease, intracranial aneurysm,
type 2 diabetes, as well as several tumor types [24,57,58].
Therefore, exploring the structural impact of these genetic variations
within lncRNAs may provide additional insights and evidence
regarding their functionality. Indeed, our analysis also identified
highly significant structure-disruptive SNPs within IBD and T1D
loci-associated lncRNAs that might be related to altered
transcription levels of the lncRNAs and their associated
protein-coding genes.

Genetic variations can strongly affect the gene expression
phenotype, and various studies have also reported tissue-specific
eQTLs [59,60]. Recently, a number of disease- or trait-associated
SNPs were found to be tissue-dependent lincRNA cis-eQTLs [36].
We also identified tissue-specific lncRNA cis-eQTLs associated
with structure-disruptive SNPs. For example, for the sense
lncRNA NONHSAG044354 associated with the IBD and T1D
candidate gene BACH2, we observed a significant tissue-specific
cis-eQTL signal, but only in whole blood, based on the gene-SNP
association for the structure-disruptive SNP rs3757247 and
BACH2. The propensity of variation in gene expression patterns
observed between and within species is strongly determined by
adaptive changes in gene regulation mechanisms [61]. Indeed,
eQTL signals frequently map to regions of the human genome
that have undergone recent positive selection [62]. Interestingly,
our analysis also identified a number of lncRNA cis-eQTLs
under recent positive selection. For example, for the antisense
lncRNA NONHSAG043608 associated with the T1D candidate
genes SYNGAP1 and ZBTB9, we found signals for recent positive
selection in the YRI population and a significant tissue-specific
cis-eQTL associated with the HLA-DPB1 gene in thyroid tissue (Table
Figure 5. cis-eQTLs and gene-SNP associations for rs3757247 and rs597325 with BACH2 candidate gene. (A) The gene-SNP association
for rs3757247 and BACH2 candidate gene in whole blood. (B, C, D) The gene-SNP association for rs597325 and BACH2 in colon transverse, brain cortex
and cell line fibroblasts. The gene-SNP associations are calculated with a linear regression model using the Genotype-Tissue Expression Portal (GTEx) in the
selected tissues with more than 80 samples using a cis window of +/-1 MB around the transcription start site (TSS) at a significance level of 0.05. For
each gene-SNP association plot, p-values are displayed at the bottom.
doi:10.1371/journal.pone.0105723.g005



S1). In contrast, for the antisense NONHSAG041519 and
sense NONHSAG041520 lncRNAs associated with the IBD candidate
genes SLC22A4 and SLC22A3, we found strong signals for
recent positive selection in the CEU population (Table S1).

Gene expression is a tightly controlled phenomenon and is
differentially regulated across tissues and cell types. In the case of
lincRNAs, studies have demonstrated their spatiotemporally specific
expression [3,63,64]. Expression analysis of IBD and T1D
candidate genes and their associated lncRNAs using the HBM dataset
revealed diverse tissue-specific expression patterns across
all tissues. LncRNAs intersecting exons of protein-coding genes
in sense and antisense orientations are known to be enriched for
Table 3. Structure-disruptive SNPs and their associated lncRNAs.

Structure-disruptive SNPs                                                               IBD    T1D
Structure-disruptive SNPs (P-val <= 0.2)                                                362    178
LncRNA genes harboring structure-disruptive SNPs                                        192    102
Total IBD/T1D candidate genes harboring structure-disruptive SNPs                       124    94
Protein-coding genes harboring structure-disruptive SNPs                                118    63
Structure-disruptive SNPs with annotated ENCODE data using RegulomeDB                   259    143
Structure-disruptive SNPs having cis-eQTLs                                              94     28
Structure-disruptive SNPs under recent positive selection                               12     13
Candidate lncRNAs under recent positive selection harboring structure-disruptive SNPs   14     14

Total numbers of structure-disruptive SNPs are reported after ranking based on RegulomeDB score, cis-eQTLs and recent positive selection.
doi:10.1371/journal.pone.0105723.t003



positive correlations with their mRNA host [50]. We also observed
strong positive correlations for lncRNAs intersecting IBD and
T1D candidate genes. For example, the IBD and T1D
loci-associated lncRNAs NONHSAG044354 and NONHSAG026183,
intersecting candidate genes BACH2 and FUT2 respectively,
showed strong positive correlations with their host mRNA
expression levels (Figure 8 and S12). These findings suggest that
expression of the host mRNA could be regulated by their
intersecting sense and antisense lncRNAs. BACH2 is one of the
common candidate genes implicated in multiple inflammatory
diseases including IBD and T1D, and recently it has been shown
to regulate CD4+ T-cell differentiation, which in turn averts
inflammatory disease by maintaining the balance between
tolerance and immunity [65]. In our analysis we observed a
strong LD signal between the BACH2-associated lncRNA
(NONHSAG044354) structure-disruptive SNP rs3757247 and the T1D risk
SNP rs11755527, and also with the IBD risk SNP rs1847472, in the CEU
population (Figure 4). SNP rs11755527 has also been reported to
be strongly associated with thyroid peroxidase autoantibodies
(TPOA) [66]. In a recent study, disease-associated SNP pairs that
are in high LD were found to form structure-stabilizing haplotypes
in UTRs of the human genome [67]. Notably, two of the SNPs
from our ranked list of structure-disruptive SNPs in IBD
loci-associated lncRNAs (Table S1) were also reported to alter the
ensemble of RNA structures in the above study. Both of these
SNPs, rs1050088 (DAG1 3'UTR variant) and rs2836723 (lincRNA
AF064858 exon variant), are known IBD-associated variants
(GWAS p-value 7.34e-12 and 2.94e-05 [71]). Taken together,
these data further suggest that specific pairs of lncRNA-associated
SNPs that are in high LD could form RNA structure-stabilizing
haplotypes altering the ensemble of structures adopted by the
mRNA. Nevertheless, genetic variation-driven changes in lncRNA
structure may not be the only mechanism underlying the disease
phenotype; rather, the phenotype is likely the outcome of an
array of cumulative molecular aberrations, including loss of
regulatory binding sites coupled with expression dysregulation.

Materials and Methods 

Data retrieval 

We retrieved protein coding and non-coding genes within the
IBD and T1D regions (referred to as candidate genes in this study).
These IBD and T1D regions are based on genome-wide
association studies (p<5e-08) or have attained significant
association in a candidate gene study. Inflammatory bowel disease (IBD)
candidate genes were retrieved from the IBDsite database [68].
The IBDsite database contains data related to the biomolecular
mechanisms associated with the onset of IBD, and it also hosts
the human and bacterial information related to Crohn's disease
(CD) and ulcerative colitis (UC). A total of 1432 IBD human
candidate genes were retrieved and filtered to remove their
isoforms and antisense transcripts. After filtering, the IBD dataset
included 1333 IBD candidate genes. T1D regions were retrieved
from T1Dbase (version 4.15) [69]. T1Dbase is a web-based
resource focused on the genetics and genomics of type 1 diabetes
(T1D) susceptibility in mouse, rat and human. In total, 899 T1D
candidate genes were retrieved from T1Dbase. Human long
non-coding RNAs (lncRNAs) were retrieved from the NONCODEv4
database [70]. NONCODE is an integrated database of all types
of noncoding RNAs (except tRNAs and rRNAs) obtained from
various sources. The current database contains 95,135 annotated
human lncRNA transcripts (56,018 lncRNA genes). Combined
GWAS and ImmunoChip SNPs for IBD were retrieved from the
International Inflammatory Bowel Disease Genetics Consortium
(IIBDGC) database (http://www.ibdgenetics.org) [71]. This
dataset is based on a meta-analysis of GWAS datasets after
imputation to the HapMap3 reference set, with any SNPs with
p<0.01 replicated in the ImmunoChip data. Meta-analyzed GWAS
SNPs for T1D were retrieved from T1DGC [52]. For the
T1D GWAS SNPs, a cutoff of p-value<0.01 was used to
filter nominally significant SNPs. Since the coordinates of both
datasets were based on human genome build GRCh36/hg18,
all SNP coordinates were converted to the current build
of the human genome (GRCh37/hg19).
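
The p-value cutoff applied to the T1D GWAS SNPs can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the record layout (`rsid`, `p`) and the helper name `filter_nominal` are hypothetical, and the second and third records are made-up placeholders; only the rs11755527 p-value is taken from the Figure 4 legend.

```python
def filter_nominal(snps, p_cutoff=0.01):
    """Return the SNP records whose p-value is strictly below p_cutoff."""
    return [snp for snp in snps if snp["p"] < p_cutoff]


snps = [
    {"rsid": "rs11755527", "p": 5.38e-08},  # p-value quoted in the Figure 4 legend
    {"rsid": "rs_example_1", "p": 0.004},   # hypothetical record
    {"rsid": "rs_example_2", "p": 0.2},     # hypothetical record
]

nominal = filter_nominal(snps)
print([snp["rsid"] for snp in nominal])
```

The comparison is strict (`<`), matching the "p-value<0.01" wording above; a SNP with p exactly 0.01 would be dropped.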

IBD and T1D loci-associated lncRNAs and GWAS/
ImmunoChip SNPs

Human lncRNA gene coordinates from the NONCODEv4 database
were intersected with candidate gene coordinates of IBD and
T1D loci using the intersectBed feature from the BedTools suite [72]
to identify lncRNA genes located within IBD and T1D loci.
LncRNAs present within 5 kb up- or downstream of
the IBD and T1D candidate genes were also identified. miRNA
genes located within the mapped IBD and T1D loci-associated
lncRNAs, as well as the GWAS/ImmunoChip SNPs for IBD and the
GWAS SNPs for T1D, were mapped to the IBD and T1D
loci-associated lncRNAs using intersectBed. In our dataset, based
on the genomic association of lncRNAs to the protein coding
Figure 6. Expression of antisense IBD and T1D loci-associated lncRNAs. Expression profiles for (A) 49 antisense IBD loci-associated lncRNAs
and (B) 37 antisense T1D loci-associated lncRNAs expressed across all the HBM tissues. The FPKM threshold of >1 was used and the values were log10
transformed.
doi:10.1371/journal.pone.0105723.g006



genes, we categorized lncRNAs as sense exonic, sense non-exonic,
antisense, and intergenic (Figure 2).
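
The intersection and categorization steps can be illustrated with a small pure-Python sketch. It is not the BedTools implementation and not the authors' exact rule set: the `Interval` type, the helper names, the toy coordinates, and the specific decision order (no overlap gives intergenic, opposite strand gives antisense, exon overlap on the same strand gives sense exonic) are assumptions made for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, half-open, as in BED
    end: int
    strand: str


def overlaps(a, b, flank=0):
    """True if a overlaps b after extending b by `flank` bp on both sides."""
    return (a.chrom == b.chrom
            and a.start < b.end + flank
            and b.start - flank < a.end)


def classify(lnc, gene, exons):
    """Assign one of the four categories relative to a single gene (assumed rules)."""
    if not overlaps(lnc, gene):
        return "intergenic"
    if lnc.strand != gene.strand:
        return "antisense"
    if any(lnc.start < e_end and e_start < lnc.end for e_start, e_end in exons):
        return "sense exonic"
    return "sense non-exonic"


# Toy gene with two exons; all names and coordinates are hypothetical.
gene = Interval("GENE_A", "chr6", 10_000, 20_000, "+")
exons = [(10_000, 10_500), (19_000, 20_000)]
lncs = [
    Interval("lnc1", "chr6", 10_200, 10_800, "+"),
    Interval("lnc2", "chr6", 12_000, 13_000, "+"),
    Interval("lnc3", "chr6", 12_000, 13_000, "-"),
    Interval("lnc4", "chr6", 22_000, 23_000, "+"),  # inside the 5 kb flank
    Interval("lnc5", "chr6", 30_000, 31_000, "+"),  # outside the 5 kb flank
]

for lnc in lncs:
    if overlaps(lnc, gene, flank=5_000):  # locus plus 5 kb up/downstream
        print(lnc.name, classify(lnc, gene, exons))
```

With real data the same overlap test would be run per candidate gene over sorted BED records, which is the work intersectBed performs efficiently.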

Sequence analysis of IBD and T1D loci-associated
lncRNAs

IBD and T1D loci-associated lncRNAs were subjected to
sequence analysis, including comparison of their length distribution,
GC content and the presence of various repeat elements.
RepeatMasker version 4.0.2 [73] was used to compare the distribution
of repeat classes within the IBD and T1D loci-associated lncRNAs
using the RepBase library dated 2013-04-22.
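
As a minimal sketch of the GC content comparison, the percentage of G and C among unambiguous bases can be computed per sequence and averaged per group. The function name and the toy sequences below are hypothetical; how the original analysis handled ambiguous bases is not stated, so skipping them here is an assumption.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T/U); 0.0 for empty input."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def mean_gc(sequences):
    """Average GC percentage over a collection of sequences."""
    values = [gc_content(s) for s in sequences]
    return sum(values) / len(values)


# Toy sequences standing in for lncRNA transcripts (hypothetical).
toy_lncrnas = ["GGCCAATT", "GCGCGCAT", "AAAATTTT"]
print(round(mean_gc(toy_lncrnas), 1))
```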

Structure-disruptive SNPs in IBD and T1D loci-associated
sense exonic/non-exonic and intergenic lncRNAs

We employed RNAsnp [74] to predict the structural effects of
GWAS SNPs within the IBD and T1D loci-associated lncRNAs.
RNAsnp focuses on the local regions of maximal structural change
between wild-type and mutant, and the mutation effects are




Figure 7. Correlations of expression for antisense IBD and T1D loci-associated lncRNAs. Spearman correlations for (A) antisense IBD and
(B) antisense T1D loci-associated lncRNAs expressed across all the HBM tissues. The FPKM threshold of >1 was used and the values were log10
transformed.
doi:10.1371/journal.pone.0105723.g007
Figure 8. Comparison of expression levels of NONHSAG044354 with its host gene (BACH2) in 14 tissues. (A) Tissue-specific gene
expression profile of the BACH2 candidate gene and its associated lncRNA NONHSAG044354 across 14 tissues based on Human Body Map (HBM) data. (B)
Sense exonic lncRNA NONHSAG044354 positively correlating (rs = 0.85) with the BACH2 candidate gene. Protein coding mRNA expression is plotted on the
x-axis and lncRNA expression is shown on the y-axis, both as log10 FPKM for 14 HBM tissues.
doi:10.1371/journal.pone.0105723.g008



quantified in terms of empirical p-values. Moreover, RNAsnp uses
extensive pre-computed tables of the distribution of SNP effects as a
function of sequence length, GC content and SNP position. For
our analysis we used the mode 1 option of RNAsnp, which employs
global folding and a folding window of +/-200 nucleotides
around the SNP position to calculate base pairing probability
matrices for wild-type and mutant subsequences. The difference
between wild-type and mutant base pair probabilities for the local
regions is measured using Euclidean distance with the
corresponding p-values. The local region with the maximum
structural change and a p-value less than 0.2 was considered a
significant structural change. All SNPs passing the p-value
threshold of <0.2 were selected for further downstream analysis.
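
The distance measure described above can be sketched on toy data. This is not RNAsnp itself: the base pair probability matrices are hand-made rather than produced by a folding algorithm, the helper names are hypothetical, and the empirical p-values (which RNAsnp derives from its pre-computed tables) are supplied here as given inputs.

```python
import math


def pairing_profile(bpp):
    """Per-position probability of being paired: row sums of a symmetric matrix."""
    return [sum(row) for row in bpp]


def euclidean_distance(wt, mut, start, end):
    """Euclidean distance between two profiles over the local region [start, end)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(wt[start:end], mut[start:end])))


def select_disruptive(results, cutoff=0.2):
    """Keep SNPs whose p-value is strictly below the cutoff."""
    return [r["rsid"] for r in results if r["p"] < cutoff]


# Toy 4-nt molecule: the wild type pairs 0-3 and 1-2; the mutant weakens 0-3.
wt_bpp = [[0, 0, 0, 0.9], [0, 0, 0.8, 0], [0, 0.8, 0, 0], [0.9, 0, 0, 0]]
mut_bpp = [[0, 0, 0, 0.1], [0, 0, 0.8, 0], [0, 0.8, 0, 0], [0.1, 0, 0, 0]]

wt, mut = pairing_profile(wt_bpp), pairing_profile(mut_bpp)
print(round(euclidean_distance(wt, mut, 0, 4), 4))

# Hypothetical p-values, used only to show the <0.2 selection step.
results = [{"rsid": "snp_a", "p": 0.05}, {"rsid": "snp_b", "p": 0.2}]
print(select_disruptive(results))
```

Raw distances are not comparable across regions of different length, which is why the selection is made on p-values rather than on the distance itself.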

eQTLs, ENCODE annotation, LD, and tissue-specific
expression of structure-disruptive SNPs in IBD and T1D
loci-associated lncRNAs

To retrieve cis-eQTLs for structure-disruptive SNPs in IBD and
T1D loci-associated lncRNAs, we used the Genotype-Tissue
Expression project resource (GTEx)
(http://www.broadinstitute.org/gtex/) [44]. For a query SNP, GTEx
provides pre-computed significant cis-eQTLs from 9 tissues (adipose
subcutaneous, artery tibial, heart left ventricle, lung, muscle
skeletal, nerve tibial, skin sun exposed, thyroid and whole blood)
with more than 80 samples using a cis window of +/-1 MB around the
transcription start site (TSS) [44]. We also calculated eQTLs in
selected tissues based on gene-SNP associations using GTEx for all
those structure-disruptive SNPs for which no pre-computed cis-eQTLs
were available.
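
A gene-SNP association of the kind described above regresses expression on genotype dosage for SNPs that fall inside the cis window. The sketch below is a simplified illustration, not the GTEx pipeline: it omits covariates and the p-value calculation, and the function names, positions and expression values are hypothetical.

```python
def in_cis_window(snp_pos, tss_pos, window=1_000_000):
    """True if the SNP lies within +/- `window` bp of the transcription start site."""
    return abs(snp_pos - tss_pos) <= window


def linear_fit(x, y):
    """Ordinary least squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data: genotype dosage (0, 1 or 2 copies of the alternate allele) per
# sample and the matching expression values (hypothetical numbers).
dosage = [0, 1, 2, 0, 1, 2]
expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

if in_cis_window(snp_pos=1_500_000, tss_pos=1_000_000):
    slope, intercept = linear_fit(dosage, expression)
    print(slope, intercept)
```

The slope is the estimated change in expression per additional copy of the allele; a full analysis would test it against zero at the 0.05 significance level mentioned in the Figure 5 legend.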

All regulatory features and ENCODE annotations for
structure-disruptive SNPs in IBD and T1D loci-associated lncRNAs were
retrieved from the RegulomeDB database [43]. RegulomeDB
annotates SNPs with known and predicted regulatory elements in
intergenic regions of the human genome using datasets from GEO
[75], the ENCODE project [76], and published literature.

Regional LD plots for selected SNPs were generated using SNP
annotation and proxy search (SNAP) [77]. We selected the HapMap3
(release 2) SNP dataset and a distance limit of 500 kb.
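
For reference, the r2 statistic reported in LD plots follows the standard definition r2 = D^2 / (pA(1-pA) pB(1-pB)), with D = pAB - pA*pB. The sketch below computes it from phased haplotypes; it is not how SNAP works internally (SNAP serves pre-computed values), and the haplotype data and function name are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (a, b) with 1 for the reference allele at
    SNP A / SNP B and 0 otherwise. Both SNPs must be polymorphic.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def within_distance(pos_a, pos_b, limit=500_000):
    """True if two SNPs are no further apart than the distance limit."""
    return abs(pos_a - pos_b) <= limit


perfect_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(perfect_ld), ld_r2(no_ld))
```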



The tissue-specific expression profiles for all IBD and T1D
candidate genes and their associated lncRNA genes were retrieved
using Human BodyMap 2.0 data (ENA archive: ERP000546)
across 16 human tissues and NONCODEv4 [70]. Expression
values were expressed as fragments per kilobase of exon per
million reads (FPKMs), which is a measure of gene expression
normalized to the size of the gene and the RNA-seq library size.
Expression values (FPKMs) were log10 transformed and
graphically represented as heatmaps for IBD and T1D loci-associated
lncRNAs using an FPKM threshold of >1 (Figure 6, S8 and S9).
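
The log10 transformation and the Spearman correlations reported in Figures 7 and 8 can be sketched as below. This is an illustrative implementation under stated assumptions: the threshold is applied per value here, whereas the original analysis may have applied it per gene, and the expression numbers are invented.

```python
import math


def log10_fpkm(values, threshold=1.0):
    """log10-transform the FPKM values that exceed the threshold."""
    return [math.log10(v) for v in values if v > threshold]


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical FPKM values for a host mRNA and its lncRNA in four tissues.
mrna = [2.0, 15.0, 40.0, 120.0]
lncrna = [1.5, 3.0, 9.0, 30.0]
print(spearman(log10_fpkm(mrna), log10_fpkm(lncrna)))
```

Because Spearman correlation depends only on ranks, the log10 transform does not change its value; the transform matters for the heatmaps and scatter plots.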

IBD and T1D loci-associated lncRNAs under recent
positive selection

To identify loci-associated lncRNAs under recent positive
selection, we focused on structure-disruptive SNPs present within
the IBD and T1D loci-associated lncRNAs. We made use of
Haplotter [45], a tool based on HapMap project data [49].
Haplotter is based on the integrated Haplotype Score (iHS), a statistic
used to detect evidence of recent positive selection at a locus, and
covers iHS data for three populations: ASN (combined Japanese
from Tokyo, Japan and Han Chinese from Beijing, China), CEU
(Utah residents with Northern and Western European ancestry
from the CEPH collection) and YRI (Yoruba from Ibadan,
Nigeria). Only SNPs with extreme iHS scores (iHS >= 2.5 or
iHS <= -2.5) were retrieved [45]. The IBD and T1D
loci-associated lncRNAs that are under recent positive selection were
identified based on the above criteria.
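
The iHS criterion amounts to keeping SNPs with |iHS| >= 2.5 and reporting the lncRNAs that contain them. A minimal sketch, assuming a hypothetical record layout (`rsid`, `lncrna`, `population`, `ihs`) and made-up scores:

```python
def extreme_ihs(snps, cutoff=2.5):
    """Keep SNP records with iHS >= cutoff or iHS <= -cutoff."""
    return [snp for snp in snps if abs(snp["ihs"]) >= cutoff]


def lncrnas_under_selection(snps, cutoff=2.5):
    """Sorted, de-duplicated lncRNA identifiers carrying at least one extreme SNP."""
    return sorted({snp["lncrna"] for snp in extreme_ihs(snps, cutoff)})


# Hypothetical records; identifiers and scores are placeholders.
snps = [
    {"rsid": "snp_1", "lncrna": "lnc_A", "population": "CEU", "ihs": 2.7},
    {"rsid": "snp_2", "lncrna": "lnc_A", "population": "YRI", "ihs": -0.4},
    {"rsid": "snp_3", "lncrna": "lnc_B", "population": "YRI", "ihs": -2.5},
    {"rsid": "snp_4", "lncrna": "lnc_C", "population": "ASN", "ihs": 1.9},
]

print(lncrnas_under_selection(snps))
```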

Supporting Information 

Figure S1 Log10 transformed length distribution of
NONCODEv4 human lncRNA genes.
(TIFF)

Figure S2 Comparison of length distribution of IBD and
T1D loci-associated lncRNA genes. IBD and T1D
loci-associated lncRNAs revealed significant differences in their
average lengths as compared to the total lncRNAs.
(TIFF)
Figure S3 Comparison of GC content of IBD and T1D
loci-associated lncRNA genes. Significantly higher GC
content in T1D loci-associated lncRNA genes (p-value < 10e-6,
Welch two sample t-test) was observed as compared to the
background (total lncRNAs).
(TIFF)

Figure S4 Distribution of repeat elements in NONCODEv4
human lncRNA genes.
(TIFF)

Figure S5 Relative abundance of interspersed repeat
elements within IBD and T1D loci-associated lncRNA
genes. Significant differences in the interspersed repeat element
distributions for both IBD and T1D loci-associated lncRNAs were
observed (p-value < 10e-6, Chi-square goodness of fit test) as
compared to the background (total lncRNAs).
(TIFF)

Figure S6 Top two structure-disruptive SNPs within
IBD and T1D loci-associated lncRNAs (ranked based on
RegulomeDB score and RNAsnp p-value). (A) SNP
rs2227319 (structure-disruptive SNP within IBD loci-associated
lncRNA NONHSAG021725). (B) SNP rs243327
(structure-disruptive SNP within T1D loci-associated lncRNA
NONHSAG018599).
(TIFF)

Figure S7 cis-eQTLs for the top two structure-disruptive
SNPs within IBD and T1D loci-associated lncRNAs
(ranked based on RegulomeDB score and RNAsnp
p-value). (A) cis-eQTLs for structure-disruptive SNP rs2227319
within IBD loci-associated lncRNA NONHSAG021725. (B)
cis-eQTLs for structure-disruptive SNP rs243327 within T1D
loci-associated lncRNA NONHSAG018599. For each gene-SNP
association plot, p-values are displayed at the bottom. For a query
SNP, GTEx provides pre-computed significant cis-eQTLs from 9
tissues (adipose subcutaneous, artery tibial, heart left ventricle,
lung, muscle skeletal, nerve tibial, skin sun exposed, thyroid and
whole blood) with more than 80 samples using a cis window of
+/-1 MB around the transcription start site (TSS).
(TIFF)

Figure S8 Expression levels for intergenic (A) and sense
exonic/non-exonic (B) IBD loci-associated lncRNAs
expressed across all HBM tissues (at FPKM threshold
of >1). The FPKM threshold of >1 was used and the values were
log10 transformed.
(TIFF)

Figure S9 Expression levels for intergenic (A) and sense
exonic/non-exonic (B) T1D loci-associated lncRNAs